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FILES ON DISTRIBUTED NETWORKS 
BACKGROUND OF THE INVENTION 

The present invention relates to a data storage/access 
in a client-server system which consists of a plurality of 
hosts each of which may act as either a server or clients 
and which are interconnected by a shared communication 
channel . 

Research and development have been achieved on a 
server with a storage device for storing a number of files, 
such as a movie. The server distributes these files upon a 
demand from a client, 

A video server system needs extension due to lack of 
capacity of server computers, it has been solved by 
replacing the old ones with a higher performance server 
computer, or by increasing the number of server computers 
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1 SO that a load of processing is distributed over a 

2 plurality of server-computers. The latter way of extending 

3 the system by increasing the number of server computers is 

4 effective in terms of workload and cost- A video server as 

5 such is introduced in "A Tiger of Microsoft, United States, 

6 Video on Demand" in an extra volume of Nikkei 

7 Electronics titled "Technology which underlies Information 

8 Superhighway in the United States", pages 40, 41 published 

9 in Oct. 24, 1994 by Nikkei BP. 

10 A server system includes a network and server- 

11 computers. The server-computers are connected to the 

12 network and have a function as a video server, magnetic 

13 disk unit which are connected to the server computers and 

14 stores video programs, clients which are connected to the 

15 network and demand the server computers to read out a video 

16 program. Each server computer has a different plurality of 

17 , set of video programs such as a movie stored in the 



magnetic disk units. A client therefore reads out a video 
program via one of the server-computers which has a 
magnetic disk units where a necessary video program is 
stored. The server system in which each one of a plurality 
of server-computers stores an independent set of video 
programs. The server system is utilized efficiently when 
each demand on a video program is distributed to different 
server computers. However when a plurality of accesses 
rush into a certain video program, a work load increases on 
a server computer where this video program is stored, 
namely a work load disparity will be caused among server 
computers. Even if the other server computers remain idle, 
the whole capacity of the system has reached to the utmost 
level because of the overload on a capacity of a single 
computer. This deteriorates the efficiency of the server 
system. 

U. S. Patent No. 5,630,007 teaches a client-server 
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system which includes a plurality of servers and a 
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plurality of storage devices. The storage devices 
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sequentially store data. The data is distributed in each 
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of the plurality of storage devices. Each server device is 
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connected to the plurality of storage devices for accessing 
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the data distributed and stored in each of the plurality of 


"'=4 

t 3 


7 


storage devices. The client-server system improves 
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efficiency of each server by distributing loads to a 
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plurality of servers. The client-server system also 
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includes an administration apparatus. The administration 
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apparatus is connected to the plurality of servers for 


12 


administrating the data sequentially stored in the 
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plurality of storage devices and the plurality of servers. 
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A client is connected to both the administration apparatus 




15 


and the plurality of servers. The client specifies a 




16 


server that is connected to a storage device where a head 
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block of the data is stored by inquiring to the 
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administration apparatus and accesses the data in the 
plurality of servers according to the order of the data 
storage sequence from the specified server • The client 
makes an inquiry to the administration apparatus and 
accesses the data in the plurality of servers in accordance 
to the order of the data storage sequence from the 
specified server. 

U, S. Patent No. 5,905,847 teaches a client-server 
system which improves efficiency of each server by 
distributing loads to a plurality of servers having a 
plurality of storage devices. The storage devices 
sequentially store data. The data is distributed in each 
of the plurality of storage devices. Each server is 
connected to the plurality of storage devices for accessing 
the data distributed and stored in each of the plurality of 
storage devices. An administration apparatus is connected 
to the plurality of servers for administrating the data 



sequentially stored in the plurality of storage devices and 
the plurality of servers • A client is connected to both 
the administration apparatus and the plurality of servers. 
The client specifies a server which is connected to a 
storage device in which a head block of the data is stored 
by making an inquiry to the administration apparatus and 
accesses the data in the plurality of servers in accordance 
to the order of the data storage sequence from the 
specified server. 

U. S, Patent No. 5,926,101 teaches a multi-hop 
broadcast network of nodes which have a minimum of hardware 
resources, such as memory and processing power. The 
network is configured by gathering information concerning 
which nodes can communicate with each other using flooding 
with hop counts and parent routing protocols. A 
partitioned spanning tree is created and node addresses are 
assigned so that the address of a child node includes as 



1 its most significant bits the address of its parent. This 

2 allows the address of the node to be used to determine if 

3 the node is to process or resend the packet so that the 

4 node can make complete packet routing decisions using only 

5 its own address. 

6 U. S. Patent No. 6,108,703 teaches a network- 

7 architecture which has a framework. The framework supports 

Q 

. 8 hosting and content distribution on a truly global scale. 

r z 

sj 9 The framework allows a content provider to replicate and 

In 10 serve its most popular content at an unlimited number of 

m 

!H 11 points throughout the world. The framework includes a set 

fU 12 of servers operating in a distributed manner. The actual 

13 content to be served is preferably supported on a set of 

14 hosting servers (sometimes referred to as ghost servers) . 

15 This content includes HTML page objects that are served 

16 from a content provider site. A base HTML document portion 

17 of a Web page is served from the content provider's site 
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while one or more embedded objects for the page are served 
from the hosting servers, preferably, those hosting servers 
near the client machine. By serving the base HTML document 
from the content provider's site, the content provider 
maintains control over the content. 

U, S, Patent No, 5,367,698 teaches a networked digital 
data processing system which has two or more client devices 
and a network. The network includes a set of inter- 
connections for transferring information between the client 
devices. At least one of the client devices has a local 
data file storage element for locally storing and providing 
access to digital data files arranged in one or more client 
file systems. A migration file server includes a migration 
storage element that stores data portions of files from the 
client devices, a storage level detection element that 
detects a storage utilization level in the storage element, 
and a level-responsive transfer element that selectively 
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transfers data portions of files from the client device to 
the storage element. 

U. S. Patent No. 5,802,301 teaches a method for 
improving load balancing in a file server. The method 
includes the steps of determining the existence of an 
overload condition on a storage device having a plurality 
of retrieval streams, accessing at least one file thereon, 
selecting a first retrieval stream reading a file, 
replicating a portion of the file being read by the first 
retrieval stream onto a second storage device and reading 
the replicated portion of the file on the second storage 
device with a retrieval stream capable of accessing the 
replicated portion of the file. The method enables the 
dynamic replication of data objects to respond to 
fluctuating user demand. The method is particularly useful 
in file servers such as multimedia servers delivering 
continuously in real time large multimedia files such as 



movies , 

U. S, Patent No. 5,542,087 teaches a data processing 
method which generate a correct memory address from a 
character or digit string such as a record key value and 
which is adapted for use in distributed or parallel 
processing architectures such as computer networks, 
multiprocessing systems, and the like. The data processing 
method provides a plurality of client data processors and a 
plurality of file servers. Each server includes at least a 
respective one memory location or "bucket". The data 
processing method includes the steps of generating a key 
value by means of any one of the client data processors and 
generating a first memory address from the key value. The 
first address identifies a first memory location. The data 
processing method also includes the steps of selecting 
from the plurality of servers a server that includes the 
first memory location, transmitting the key value from the 

10 



1 one client to the server that includes the first memory 

2 location and determining whether the first address is the 

3 correct address by means of the server. The data 

4 processing method further provides that if the first 

5 address is not the correct address then performing the 

6 steps of generating a second memory address from the key 

7 value by means of the server, the second address 

8 identifying a second memory location, selecting from the 

9 plurality of servers another server which includes the 

10 second memory location, transmitting the key value from the 

11 server that includes the first memory location to the other 

12 server which includes the second memory location, 



13 determining whether the second address is the correct 

14 address by means of the other server and generating a third 

15 memory address, which is the correct address, if neither 

16 the first or second addresses is the correct address. The 

17 data processing method provides fast storage and subsequent 
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searching and retrieval of data records in data processing 
applications such as database applications - 

Distributed storage and sharing of data and program 
files has become an integral part of doing business over 
the Internet and other distributed networks. Such a 
distributed environment is characterized by the fact that 
multiple copies of the same file reside over the network. 

In peer-to-peer networking each user also doubles as a 
server connected to the Internet. Service providers, such 
as Napster, Gnutella and Freenet have emerged. This 
emerging technology has the potential to revolutionize 
Internet and E-Commerce, but several technological 
challenges have to be overcome before it can be translated 
into a robust product which hundreds of millions of 
customers can reliably use. 

The most frequent use of such a network is for 
downloading purposes. A client looks up the content list, 

12 



and wants to download a particular file/content from the 
network. The existing protocols for this process are 
extremely simple and can be described in general as 
follows. The client or a central server searches the list 
of servers that contain the desired file, and picks one 
such server (either randomly or according to some priority 
list maintained by the central server) and establishes a 
direct connection between the client requesting the down 
load and the chosen server. This connection is maintained 
until the entire file has been transferred. The exact 
implementation might vary from one protocol to another; 
however, the fact that only one server is picked for the 
transfer of the entire requested file remains invariant. 
The above-mentioned existing protocols suffer from 
several serious drawbacks, as stated next. Since only one 
server is picked for the transfer of the entire file (even 
though there are potentially many servers with the same 

13 
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1 content) , the quality of service becomes totally dependent 

2 on the bandwidth and the reliability of the Internet access 

3 that the chosen server maintains during the transfer. This 

4 poses a serious problem, especially in the case of networks 

5 that primarily comprise of low-performance servers as is 

6 the case for Napster and other proposed peer-to-peer 

7 networks and the reliability and speed of the host 

8 computers cannot be guaranteed. The average available 

9 bandwidth could be as low as that of a 28. 8K or a 56K 

10 modem. Moreover, the connection of the server to the 

11 Internet could be dropped in the middle of a download, 

12 necessitating another attempt from the beginning. For 

13 example, an average MP3 file is around 5 Mega-bytes in 

14 length, and it will take around 16-20 minutes to download 

15 it over a 56K modem! ! If the connection is dropped at any 

16 time during this period, then one needs to attempt the 

17 download all over again. The issue of choosing the best 

14 



server among those that have a copy of the requested file 
is not properly addressed, leading to a further loss in the 
quality of the service. If the winner is picked randomly 
then clearly it is not the best choice. Even if the winner 
is picked based on a pre-sorted list, where servers are 
ranked according to their average available bandwidth, the 
resulting scheme would be far from optimal. In particular, 
even if a server has a higher average bandwidth, since it 
comprises only a part of the host computer and shares the 
bandwidth with other competing tasks, the available 
bandwidth for the download could be drastically low during 
the time of the transfer. The protocols do not take 
advantage of the fact that the client could have a much 
higher available bandwidth than any of the potential 
servers. For example, even if the client is connected to a 
high-speed Ethernet, the effective transfer rate for the 
session could still be as low as that of a modem that the 

15 
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chosen server might be using. Accuracy and integrity of 
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the downloaded file are not usually guaranteed. Since 
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luultiole coDies of the files are inaintainpd bv differpnt 




4 


servers the issue of the integrity of the downloaded files 
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becomes a serious concern. 
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SUMMARY OF THE INVENTION 
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The oresent invention is aenerallv directed to a 
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distributed network which includes a Dluralitv of hosts and 
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a shared communication channel. Each hosts is coupled to 
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the shared communication channel • Each host acts as both a 
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client and a server. 
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In a first separate aspect of the present invention. 




15 


the distributed network is used to incast fragments from 




16 


multiple copies of a file in order to be gathered together 




17 


so that a single copy of the file can be generated. 
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1 In a second separate aspect of the present invention, 

2 at least one host has a global list with entries. Each 

3 entry contains all the necessary information about a file, 

4 The features of the present invention which are 

5 believed to be novel are set forth with particularity in 

6 the appended claims. 

f-. 7 DESCRIPTION OF THE DRAWINGS 

hs7 

y 8 Fig. 1 is a schematic diagram of a video server system 

y ■ 

9 of the prior art. 
ill 10 Fig. 2 is a schematic diagram of a video server system 

^ 11 of U. S. Patent No. 5, 630,007. 
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12 Fig. 3 is a schematic diagram of an administration 

13 table according to U. S. Patent No. 5,630,007. 

14 Fig. 4 is a schematic drawing a distributed network 

15 which has a plurality of hosts according to the present 

16 invention wherein each host acts as both a client and a 

17 server. 
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Fig, 5 is a schematic drawing of a file format for use 
in the distributed network of Fig. 4. 

Fig. 6 is a schematic drawing of an entry for a file 
in a global list in which the entry contains all the 
necessary information about the file so that a client can 
successfully complete an incasting process using the 
distributed network of Fig, 4, 
DESCRIPTION OF THE PREFERRED EMBODIMENT 

Fig. 1 is a video server system of the prior art 
includes a network 1 and server computers 2. The server 
computers 2 are connected to the network 1 and have a 
function as a video server, magnetic disk unit 3 which are 
connected to the server computers 2 and stores video 
programs, clients 5 which are connected to the network 1 
and demand the server computers 2 to read out a video 
program- Each server computer 2 has a different plurality 
of set of video programs such as a movie stored in the 
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1 


magnetic disk units 3. A client 5 therefore reads out a 




2 


video program via one of the server computers 2 which has a 




3 


magnetic disk units 3 where a necessary video program is 




4 


stored. 




5 


Referring to Fig. 2 in conjunction with Fig. 3 a video 




6 


server system 10 of U. S. Patent No. 5,630,007 includes a 




7 


network 11, such as Ethernet and ATM, and a plurality of 




8 


server computers 12. Application oroarams are connprtpd to 


%A . 
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the network 11. Magnetic disk units 31 and 32 are 
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connected to the server computers which sequentially store 
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distributed data, such as a video program, which has been 




12 


divided (referred to as "striping") to be stored in the 




13 


magnetic disk units 31 and 32, client computers 5 which are 


0 ■ 


14 


connected to the network 1 and receive video program. 




15 


application programs which operate in the client computers 




16 


5, driver programs as an access demand means which demand 




17 


access to the video program 4 having been divided and 
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sequentially stored in magnetic disk units 31 and 32 in 
response to a demand to access from application programs - 
Client-side network interfaces carry out such process as 
TCP/IP protocol in the client computers 5 and realize 
interfaces between clients and the network 1, server-side 
network interfaces which carry out such processes as TCP/IP 
protocol in the server computers 2 and realizes interface 
between servers and the network 1, server programs which 
read data block out of magnetic disk units 31 and 32 to 
supply it to the server-side network interfaces the 
original video program 11 which has not yet been divided 
nor stored, administration computer 12 connected to the 
network 1, administration program 13 operated in the 
administration computer 12 which administrates the video 
program having been divided and stored in magnetic disk 
units 31 and 32 and the server computers 2. The 
administration computer-side network interface carries out 

20 
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1 such process as TCP/IP protocol in the administration 

2 computer 12 and realizes an interface between the 

3 administration computer 12 and the network 1, and a large 

4 capacity storage 15 such as CD-ROM, which is connected to 

5 the computer 12 and the original video program 11 is stored 

6 therein. 

7 Still referring to Fig. 2 only two magnetic disk units 

8 are connected to each server computer. Each of the three 

9 server computers 2 is connected to two magnetic disk units, 

10 respectively, and also connected to the administration 

11 computer 12 and a plurality of client computers 5 which are 

12 devices on the video-receiving side, via the network 1. 

13 Each magnetic disk unit 31 or 32 is divided into block 

14 units per a certain amount. Six video programs, denoted by 

15 videos 1-6 are stored in 78 blocks denoted by blocks 0-77. 

16 Each video program is stored as if data was striped where 

17 data has been divided and distributed over the plurality of 
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1 the magnetic disk units 31 and 32. Video 1 is sequentially 

2 stored in the blocks 0-11, and video 2 is sequentially 

3 stored in the blocks 12-26. Videos 3-6 are also stored in 

4 the blocks, respectively. 

5 Referring to Fig. 4 a distributed network 110 includes 

6 a plurality of hosts 111 and a shared communication channel 

7 112. Each host is coupled to the shared communication 

8 channel 112. Each host 111 may act as both a client and a 

9 server and uses the distributed network 110, but not all of 

10 the hosts need to act as either a client or a server. The 

11 downloading process may be called incasting because it can 

12 be construed as a reverse of broadcasting. In 

13 broadcasting, a file 120 is transmitted to multiple 

14 locations generating multiple copies of the file 120. In 

15 contrast, in incasting fragments 121 of multiple copies of 

16 the file 120 are gathered together to generate a single 

17 copy of the file 120. There is a format for creating and 

22 



%J 



1 storing multiple copies of the files 120 and a protocol to 

2 guarantee fast in the sense that it utilizes the maximum 

3 available bandwidth for the task and accurate transfer of 

4 the requested content/file 120 to a client in the sense 

5 that the content of the copied file 120 is the same as that 

6 of the stored one. Incasting would constitute the backbone 

7 of the distributed network 110. 

8 Incasting addresses a key technological issue of how 

\^ 9 to provide a high-quality service in terms of both accuracy 

10 and speed for transferring a file 120, which a client has 

11 requested, to the client on the distributed network 110 

12 that support content replication. The same content or file 

13 120 can reside in several different servers on the 

14 distributed network 110. This could be either because the 

15 file 120 was created at only one server and then 

16 distributed to several others or because the same content 

17 was created or procured independently at different servers. 

23 



1 Incasting will work even if no individual server has 

2 the complete file 120, but as long as the complete file 120 

3 is collectively available on the whole distributed network 

4 110, There is a unique identification tag for each content 

5 or file 120 residing on the network, A list of all 

6 accessible content/files 120 is either available from one 

7 central server or is maintained in a distributed manner. 

8 Several servers may contain a complete or partial lists of 

9 the contents. Such a list would contain the identification 
ir§ 10 tags of all the contents. For each content/file 120 it 

" 11 would list all the servers that contain a copy of the file 

^ 12 12 0. 



13 Referring to Fig. 5 the file 120 is divided into a 

14 number of segments 121. Each segment 121 has a secure hash 

15 function. The secure hash function is used to compute a 

16 message digest, which is then signed. The number of 

17 segments 121, their locations, the hash function (s) and the 
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public key(s) for the digital signatures are recorded as 




2 


attributes of the file 120. 




3 


The incasting process will work for any existing 




4 


format for storing files 120 which follows the convention 




5 


of being byte aligned. Hence, any server can handle a 




6 


request, where it is asked to transmit blocks of bytes 




7 


along with start and end indices* For example, a typical 


i = 


8 


request could be for the transmission of M bytes of a file 




9 


120 starting at the kth byte. However, for guaranteeing 


in 

'm 
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the integrity of the files 120 and for avoiding expensive 




11 


retransmissions of potentially erroneous downloads, the 


fU 
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12 


following format for storing files 120 and partitioning the 


I 5 


13 


file 120 into a specified number of segments 121 is 




14 


recommended. For each segment 121, compute a message 




15 


digest of the contents using a secure hash function. The 




16 


message digest basically acts as a unique identifier for 




17 


the contents of the segment 121 and on reception, can be 
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used to guarantee the integrity of the contents of the 
segment 121. In order to guarantee authenticity (e,g,/ the 
fact that the file 120 was indeed created by the owner) , 
one can in addition sign the digest. Thus, if one has the 
segment 121, the message digest and the digital signature 
of the file 120, then one can verify authenticity (check 
that the signature matches the digest) and then check for 
integrity (i.e., the digest matches the contents of the 
segment 21) . For example, the Secure Hash Standard (SHS) 
can be used to generate 160-bit message digests for the 
segments 121. The Digital Signature Standard (DSS) can 
then be used to generate a 320-bit digital signature of the 
digest. Other standard hash functions (e.g., MD4 and MD5) 
and digital signature schemes (e.g., those based on RSA) 
can be used as well. The number of segments 21 and their 
starting locations can be stored in the file description. 
Moreover, if the feature of digital signature is used, then 

26 
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1 the public key(s) of the owner of the file 20 and the hash 

2 function used should also be made available in the 

3 description of the files 120. 

4 Referring to Fig. 6 each entry for a file 120 in a 

5 global list 130 contains all the necessary information 

6 about the file 120 so that a client can successfully 

7 complete an incasting process. The client wishing to 

8 download a file 120 goes through the following step of 

9 searching the distributed network 110. The client first 

10 searches the global list(s) 130 of content/files 120 (to be 

11 referred to as the network directory from hereon) to 

12 determine the availability of the desired file 120 on the 

13 distributed network 110. It is not necessary that a global 

14 network directory be maintained at one or several servers. 

15 The network directory could itself be maintained in a 

16 distributed fashion (e.g., the scheme adopted in the 

17 Gnutella network) in which case, a distributed search for 
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the desired content/file 120 will be carried out. In both 




2 


caseS/ the following information is returned to the client. 




3 


A list of (IP) addresses for the servers where the file 120 




4 


is located partially or in full. If a server has only 




5 


parts of the desired file 120, then a succinct description 




6 


(e.g., start and end byte numbers of contiguous portions of 




7 


the file 120) of the content stored in the server is also 


kQ ■ 


8 


included. If the file 120 is divided into segments 21 


ty 


9 


along with corresponding digest and digital signature, then 




10 


the client will also receive descriptions of the segments 


f 

i — 
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21, and the types of hash functions and public key(s) used 


ru 

k' — 


12 


for the digital signature. The client now has all the 




13 


storage information about the desired file 20, but does not 


O" 


14 


know the exact availability of bandwidth at the eligible 




15 


servers for any download request. Using an adaptive 




16 


incasting algorithm the client is able to virtually 




17 


segments the file 120 into a number of distinct parts and 
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requests each part from a distinct server. The exact 




2 


nature of the virtual segmentation procedure will depend on 




3 


a number of factors, including, the bandwidth available to 




4 


the client, any prior knowledge about the bandwidth 




5 


available to different servers and also the storage format 




6 


of the requested file 120. Since, these are all very 




7 


implementation-dependent, specific details of the virtual 


'\. i 
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segmentation procedure are not provided. Different servers 


''J 


9 


will respond at different time intervals to the above- 




10 


mentioned requests. For example, the servers that have 




11 


high available bandwidth will respond faster than those 


t. 

it" a J 


12 


with slower access, and some servers might not respond at 


y 


13 


all. The client can then have an online estimate of the 




14 


traffic and can change the frequency and size of the 




15 


requests adapt ively. Some servers that do not respond 




16 


during a pre-specif led time interval could dropped from the 




17 


list altogether or could be tried again after an interval 
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1 of time, if the other active servers are not fast enough. 

2 This scheme allows complete flexibility and can be used to 

3 saturate the available bandwidth of the client. As the 

4 above-mentioned adaptive protocol is carried out, the 

5 desired file 20 is received in contiguous chunks of bytes. 

6 Since the segmentation format of the file 120 is known to 

7 the client, it can always check whether any complete 

8 segment 21 of the file 120 has been downloaded or not. 

9 Once a full segment 121 of the file 120 is downloaded, it 

10 can first verify authenticity of the message digest using 

11 the digital signature and the public key and then verify 

12 the accuracy/integrity of the segment 121 by comparing the 

13 downloaded message digest with a digest that it computes on 

14 the content of the segment 121 (using a pre-specif led hash 

15 function) . If any of these verification procedures fails, 

16 then it discards the whole segment 121 and starts the 

17 requests for the bytes in that segment 121 again. Clearly, 
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there is a tradeoff here between the number of original 




2 


segments 121 in the file 120 and the number of bytes that 




3 


might be downloaded multiple times. If there are more 




4 


segments 121 in the file 120, then first the chance that a 




5 


segment 121 is corrupted is small, and second even if some 




6 


bytes are corrupted then only a small number of bytes will 




7 


need to be downloaded again. However, more segments 121 




8 


would mean a larger overhead in terms of the total size of 




9 


the file 120. For example, if the Digital Signature 
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Standard is used, then each segment 121 has to have at 


' iii. 


11 


least an additional 60 bytes: 160 bits (20 bytes) for the 




12 


message digest and 320 bits (40 bytes) for the digital 




13 


signature . 




14 


Incasting allows a client to efficiently download a 




15 


file 120 from the distributed network 110 by putting 




16 


together fragments of the file 120 obtained from different 




17 


servers that maintain partial or complete copies of the 
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1 desired file 120. While the well-known broadcasting 

2 procedure creates copies of the same file 120 at many 

3 different destination servers incasting recreates a copy of 

4 the file 120 by optimally piecing together fragments of the 

5 file 120 obtained from multiple target servers. Incasting 

6 provides both a suitable format for storing the files 120 

7 and a protocol for gathering the distributed content to 

8 create an accurate copy. The same content/file 120 can 

9 reside in several different servers on the distributed 

10 network 110. This could be either because, the file 120 

11 was created at only one server, and then distributed to 

12 several others, or because the same content was created or 

13 procured independently at different servers. In fact, our 

14 invention will work even if no individual server has the 

15 complete file 120, but as long as the complete file 120 is 

16 collectively available on the whole distributed network 

17 110. There is a unique identification tag for each content 
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or file 120 residing on the network, A list of all 
accessible content/files 120 is either available from one 
central server, or is maintained in a distributed manner 
(i.e., several servers contain the complete or partial 
lists of the contents) . Such a list would contain the 
identification tags of all the contents, and for each 
content/file 120 it would list all the servers that contain 
a copy of the file 120. 

The most frequent use of the distributed network 10 is 
for downloading purposes. A client looks up the content 
list, and wants to download a particular content/file 20 
from the distributed network 10. The existing protocols 
for this process are extremely simple, and can be described 
in general as follows. The client or a central server 
searches the list of servers that contain the desired file 
20 and picks one such server (either randomly or according 
to some priority list maintained by the central server) and 
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1 establishes a direct connection between the client 

2 requesting the down load and the chosen server. This 

3 connection is maintained until the entire file 20 has been 

4 transferred. The exact implementation might vary from one 

5 protocol to another; however, the fact that only one server 

6 is picked for the transfer of the entire requested file 120 

7 remains invariant, 

8 The distributed network includes a plurality of hosts 

9 and a shared communication channel. Each host has a 

10 storage device. U, S, Patent No, 5,630,007 teaches a 

11 distributed network which includes a plurality of servers 

12 with storage devices and a plurality of clients. In U, S. 

13 Patent No, 5,630,007 the servers are distinct from the 

14 clients. In this invention the clients and the servers are 

15 interchangeable. Each host may act as either a client or a 

16 server, A file is divided into a plurality of segments. 

17 Each segment is transmitted to the storage devices of 
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1 several of the hosts and stored in the storage device of 

2 the host. Each host is coupled to the shared communication 

3 channel. A host acting as a client requests that the other 

4 hosts acting as servers and collectively send all of the 

5 segments to the requesting client so that the requesting 

6 client can gather the segments together in order for the 

7 segments to self-assemble and generate a single copy of the 

8 file. At least one host has a global list with entries. 

9 Each entry contains all the necessary information about the 

10 file. 

11 From the foregoing it can be seen that incasting for 

12 downloading files 120 on distributed networks 110 has been 

13 described. 

14 Accordingly it is intended that the foregoing 

15 disclosure and drawings shall be considered only as an 

16 illustration of the principle of the present invention. 
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